Text File | 1993-10-26 | 6.4 KB | 135 lines | [TEXT/$Tcl] |
-
- regexp ?switches? exp string ?matchVar? ?subMatchVar
- subMatchVar ...?
-
-
- DESCRIPTION
- Determines whether the regular expression exp matches part
- or all of string and returns 1 if it does, 0 if it doesn't.
-
- If additional arguments are specified after string then they
- are treated as the names of variables in which to return
- information about which part(s) of string matched exp.
- The variable matchVar will be set to the range of string
- that matched all of exp. The first subMatchVar will contain
- the characters in string that matched the leftmost
- parenthesized subexpression within exp, the next subMatchVar
- will contain the characters that matched the next
- parenthesized subexpression to the right in exp, and so on.
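-
- As an illustration of the variable arguments, the sketch
- below pulls two fields out of a time string. The names time,
- whole, hour, and minute are hypothetical and chosen only for
- this example. The pattern is enclosed in braces so that Tcl
- does not treat the square brackets as command substitution.

```tcl
# Illustrative sketch: variable names are assumptions, not part of regexp.
set time "12:45"
if {[regexp {([0-9]+):([0-9]+)} $time whole hour minute]} {
    # whole holds the full match; hour and minute hold the
    # first and second parenthesized subexpressions.
    puts "whole=$whole hour=$hour minute=$minute"
}
```

- Here whole receives 12:45, hour receives 12, and minute
- receives 45.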
-
- If the initial arguments to regexp start with - then they
- are treated as switches. The following switches are
- currently supported:
-
- -nocase Causes upper-case characters in string to be
- treated as lower case during the matching process.
-
- -indices Changes what is stored in matchVar and the
- subMatchVars. Instead of storing the matching
- characters from string, each variable will contain
- a list of two decimal strings giving the indices
- in string of the first and last characters in the
- matching range of characters.
-
- -- Marks the end of switches. The argument following
- this one will be treated as exp even if it starts
- with a -.
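-
- The three switches can be sketched as follows. The string
- and the variable names (s, r1, r2, range, m) are assumptions
- made up for this illustration.

```tcl
# Illustrative sketch: s, r1, r2, range and m are hypothetical names.
set s "Hello World"

set r1 [regexp world $s]           ;# 0: case differs, no match
set r2 [regexp -nocase world $s]   ;# 1: W is treated as w

# With -indices the variable receives first and last indices
# of the match instead of the matching characters.
regexp -indices {o W} $s range     ;# range is "4 6"

# Without --, the pattern -x would be taken for a switch.
regexp -- -x "a-xb" m              ;# m is "-x"
```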
-
- If there are more subMatchVars than parenthesized
- subexpressions within exp, or if a particular subexpression
- in exp doesn't match the string (e.g. because it was in a
- portion of the expression that wasn't matched), then the
- corresponding subMatchVar will be set to ``-1 -1'' if
- -indices has been specified or to an empty string otherwise.
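-
- A short sketch of both cases follows. All variable names in
- it are hypothetical. In the first two commands the (a) branch
- takes no part in the match; in the third there are more
- variables than subexpressions.

```tcl
# Illustrative sketch: variable names are assumptions.

# Only the second branch matches, so first is left empty.
regexp {(a)|(b)} b m first second        ;# first is "", second is "b"

# Same match with -indices: the unused subexpression yields "-1 -1".
regexp -indices {(a)|(b)} b mi fi si     ;# fi is "-1 -1", si is "0 0"

# One subexpression but two subMatchVars: extra is set to "".
regexp {(x)} x mx s1 extra               ;# s1 is "x", extra is ""
```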
-
-
- REGULAR EXPRESSIONS
- Regular expressions are implemented using Henry Spencer's
- package (thanks, Henry!), and much of the description of
- regular expressions below is copied verbatim from his manual
- entry.
-
- A regular expression is zero or more branches, separated by
- ``|''. It matches anything that matches one of the
- branches.
-
- A branch is zero or more pieces, concatenated. It matches a
- match for the first, followed by a match for the second,
- etc.
-
- A piece is an atom possibly followed by ``*'', ``+'', or
- ``?''. An atom followed by ``*'' matches a sequence of 0 or
- more matches of the atom. An atom followed by ``+'' matches
- a sequence of 1 or more matches of the atom. An atom fol-
- lowed by ``?'' matches a match of the atom, or the null
- string.
-
- An atom is a regular expression in parentheses (matching a
- match for the regular expression), a range (see below),
- ``.'' (matching any single character), ``^'' (matching the
- null string at the beginning of the input string), ``$''
- (matching the null string at the end of the input string), a
- ``\'' followed by a single character (matching that charac-
- ter), or a single character with no other significance
- (matching that character).
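-
- The rules for branches, pieces, and atoms can be tried out
- with a few small patterns. This is only a sketch; the result
- variables r1 through r6 are hypothetical names.

```tcl
# Illustrative sketch: r1..r6 are assumed names for the results.
set r1 [regexp {^ab*c$} ac]         ;# 1: * allows zero b's
set r2 [regexp {^ab+c$} ac]         ;# 0: + needs at least one b
set r3 [regexp {^ab?c$} abc]        ;# 1: ? allows one b or none
set r4 [regexp {^(cat|dog)$} dog]   ;# 1: second branch matches
set r5 [regexp {^a\.b$} a.b]        ;# 1: \. matches a literal dot
set r6 [regexp {^a\.b$} axb]        ;# 0: x is not a literal dot
```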
-
- A range is a sequence of characters enclosed in ``[]''. It
- normally matches any single character from the sequence. If
- the sequence begins with ``^'', it matches any single char-
- acter not from the rest of the sequence. If two characters
- in the sequence are separated by ``-'', this is shorthand
- for the full list of ASCII characters between them (e.g.
- ``[0-9]'' matches any decimal digit). To include a literal
- ``]'' in the sequence, make it the first character (follow-
- ing a possible ``^''). To include a literal ``-'', make it
- the first or last character.
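-
- The range rules can be sketched as below. The variable names
- are hypothetical. The patterns are written in braces because
- an unbraced [0-9] would be read by Tcl as a command
- substitution before regexp ever sees it.

```tcl
# Illustrative sketch: variable names are assumptions.
set digits    [regexp {^[0-9]+$} 2024]    ;# 1: every character is a digit
set nondigits [regexp {^[^0-9]+$} abc]    ;# 1: no character is a digit
set mixed     [regexp {^[^0-9]+$} ab1]    ;# 0: the 1 is excluded by ^

# A literal ] placed first in the range.
regexp {[]]} {a]b} bracket                ;# bracket is "]"

# A literal - placed last in the range.
regexp {[a-]} {x-y} dash                  ;# dash is "-"
```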
-
-
- CHOOSING AMONG ALTERNATIVE MATCHES
- In general there may be more than one way to match a regular
- expression to an input string. For example, consider the
- command
-
- regexp (a*)b* aabaaabb x y
-
- Considering only the rules given so far, x and y could end
- up with the values aabb and aa, aaab and aaa, ab and a, or
- any of several other combinations. To resolve this poten-
- tial ambiguity regexp chooses among alternatives using the
- rule ``first then longest''. In other words, it considers
- the possible matches in order working from left to right
- across the input string and the pattern, and it attempts to
- match longer pieces of the input string before shorter ones.
- More specifically, the following rules apply in decreasing
- order of priority:
-
- [1] If a regular expression could match two different parts
- of an input string then it will match the one that
- begins earliest.
-
- [2] If a regular expression contains | operators then the
- leftmost matching sub-expression is chosen.
-
- [3] In *, +, and ? constructs, longer matches are chosen in
- preference to shorter ones.
-
- [4] In sequences of expression components the components
- are considered from left to right.
-
- In the example from above, (a*)b* matches aab: the (a*)
- portion of the pattern is matched first and it consumes the
- leading aa; then the b* portion of the pattern consumes the
- next b. Or, consider the following example:
-
- regexp (ab|a)(b*)c abc x y z
-
- After this command x will be abc, y will be ab, and z will
- be an empty string. Rule 4 specifies that (ab|a) gets first
- shot at the input string and Rule 2 specifies that the ab
- sub-expression is checked before the a sub-expression. Thus
- the b has already been claimed before the (b*) component is
- checked and (b*) must match an empty string.
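-
- The two examples from this section can be run as written
- to confirm the results described. The sketch below also adds
- one assumed case for Rule 1; the variable names x1, y1, x2,
- y2, z2, and early are hypothetical.

```tcl
# Illustrative sketch: variable names are assumptions.

# First example: the match begins at the start of the string.
regexp {(a*)b*} aabaaabb x1 y1        ;# x1 is "aab", y1 is "aa"

# Second example: (ab|a) claims the b before (b*) is tried.
regexp {(ab|a)(b*)c} abc x2 y2 z2     ;# x2 is "abc", y2 is "ab", z2 is ""

# Rule 1: the earliest match wins even though a longer run
# of a's appears later in the string.
regexp {a+} abaaa early               ;# early is "a"
```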
-
-
- KEYWORDS
- match, regular expression, string
-